Mapping phenotypic performance and novel SNPs for cold tolerance in tomato (Solanum lycopersicum) genotypes through GWAS and population genetics

The cold stress susceptibility of tomato (Solanum lycopersicum) curtails its cultivation, with significant impact in temperate regions and on cropping seasons. To unravel genomic regions responsible for cold stress resilience, a diverse set of fifty genotypes encompassing cultivated, wild species, and landraces were genotyped using genotyping-by-sequencing. Over two years and six trials employing both early and late sowing, these lines were evaluated. Illumina-based next-generation sequencing produced up to 3 million reads per sample from individually sequenced library pools. The Tassel pipeline yielded 10,802 variants, subsequently filtered to 3,854 SNPs for genome-wide association analysis (GWAS). Employing clustering methods (population structure) via TASSEL, SNPhylo, and Kinship matrix, the fifty genotypes clustered into four distinct gene pools. The GWAS for cold tolerance in tomato integrated key traits including yield. Using six independent phenotypic datasets representing various environments, the study identified 4,517 significant marker-trait associations for cold tolerance traits. Notably, pivotal variations (> 10%) in cold stress tolerance, particularly proline content, were linked to marker-trait associations. Additionally, 5,727 significant marker-trait associations for yield and yield-related traits were unveiled, shedding light on fruit yield and directly associated attributes. The investigation pinpointed 685 candidate genes across all examined traits, including 60 genes associated with biological processes within these genomic regions. Remarkably, 7 out of the 60 genes were directly linked to abiotic stress tolerance, functioning as stress-responsive genes either directly or indirectly. The identified genes, particularly those associated with stress response, could hold the key to enhancing cold tolerance and overall crop productivity in tomato cultivation. Supplementary Information The online version contains supplementary material available at 10.1186/s12863-024-01190-5.


Introduction
Tomato due to its versatility and comparatively short genome (950 Mb), is commonly used as representative plant species for determining the genetic basis of complex traits, providing an ideal basis for current "-omics" research and genome guided breeding and has been employed both in conventional and molecular genetics.Nutrionally, the tomatoes are designated as 'Protective Food' as they are high in lycopene, ascorbic acid and beta-carotene.Consumption of lycopene-rich diets has been linked to a lower risk of numerous malignancies, including prostate cancer and coronary heart diseases (Karppi et al., 2009).Chen et al., (2015) found that increased lycopene consumption/circulating concentration is linked to a decreased possibility of prostate cancer and this red pigment (lycopene) is currently regarded as the "world's most effective natural antioxidant", making it a valuable ingredient in commercial therapeutic formulations [1,2].
Tomatoes are extremely susceptible to chilling temperatures (0-12 °C), and many tomato-growing areas endure low (chilling) temperatures during the growing season, resulting in significant yield and colour reductions.Low temperature (cold stress) sensitivity of commercial varieties thus limits their geographic distribution, cultivation, growing period, or planting and harvesting times.Due to the prevalence of low temperatures, the number of cropping seasons is just limited to one only in temperate regions.Changes in ambient temperature have an unavoidable impact on biological processes.The rate of spontaneous and enzyme-catalysed chemical and physical reactions, the structure and molecular dynamics, and the strength of molecular contacts are all affected by temperature.Each of these side effects disrupts metabolism and cellular signal processing in some way.Due to direct suppression of metabolic activities as well as coldinduced osmosis (reduction of water uptake and cellular dehydration), the cold stress hinders plants from expressing their full genetic potential.Low temperature has a negative influence on tomato plant growth and development throughout its life cycle.It reduces germination [3], water status and photosynthesis affecting early phases of growth and development [4].In addition, the reproductive development is also severely disrupted at all stages.Cold stress results in homeotic floral changes and reduced fruit set due to poor pollen germination, resulting in a significant drop in fruit production.Cultivars that are cold-tolerant can yield two or three cropping seasons each year.This would improve the efficiency with which existing processing capacity and fresh market availability could be improved and utilized [5].
In the past fruit yield, plant habit, adaptability to machine harvesting in processing cultivars and features linked to fruit appearance for the fresh market such as firmness, colour have been the focus of the majority of breeding studies.However in recent times due to climate change tomato breeding goals are more focused towards long-term productivity in changing environmental conditions.To broaden Solanum lycopersicon's restricted genetic diversity, genetic variation from exotic germplasm collections is to be incorporated from other species like Solanum pimpinellifolium, Solanum chilense, Solanum peruvianum, Solanum habrochaites and Solanum pennellii.The first step includes studying the genetic diversity.Genomic revolution, during the last two decades, simplified understanding of the complex responses to biotic and abiotic stress in several crop plants [6,7].Since tomato species have spread across wide range of habitats of and are found in cold, dry, mesic and salty ecosystems at elevations above 3000 m MSL and therefore could be valuable for identifying and understanding the natural defence mechanism in Solanum species that allows them to deal with cold but not freezing conditions, known as cold acclimation.Plants use transcriptional, post-transcriptional and post-translational processes to modify their gene expression during cold acclimatization.Cold stress activates a variety of transduction pathways in response to environmental changes.Transcriptome studies of plants' responses to abiotic stress revealed that the plant's response to abiotic stress is mediated by many regulatory mechanisms.
Genome-wide association studies look at genetic variants across a variety's genome to discover if any of them are linked to a trait [8] and has evolved as an useful tool for investigating and linking the vast volumes of data on genome sequence variation with observable behavioural differences.In GWAS the sample size is always large making it inaccessible for the researchers by being expensive but one of the approaches can be to use a star-like design by including geographically distant accessions (Arthur and Ashley, 2014).This maximizes the genetic variance within the sample (Li et al., 2010).Therefore, we used a core collection of diverse accessions collected from various agroclimatic regions to generate a basic and comprehensive dataset for cold tolerance traits apart from yield and yield related traits.QTL and GWAS has been conducted in tomato for various traits [9].QTL studies for abiotic stresses have been conducted by Arms et al. (2015), Liu et al. (2016), Diouf et al. (2018) etc. in tomato.Also, three hundred eighty-eight suggestive association loci (including 126 significant loci) for ninety-two metabolic traits including nutrition and flavor-related loci were identified by genome-wide association study from various accessions in two different environments by Ye et al. (2019) [10].The genetic architecture underlying tomato yield-related traits has been studied through GWAS.Based on ∼4.4 million single nucleotide polymorphisms obtained from diverse accessions, a comprehensive genome-wide association study for twenty-seven 27 agronomic traits in tomato have also been conducted (Ye et al. (2020) [11].GWAS analysis of a collection of landraces and vintage was performed revealing that previously uncharacterized chromosomal regions were potentially involved in the expression of variable phenotypes by Rodriquez et al.But, a complete GWAS program with prime objective of cold tolerance has never been conducted.Therefore keeping in mind the lag this study was designed [12].

Plant material
The experimental material comprised of fifty genotypes, used for identifying genomic regions and candidate genes for cold tolerance, yield and yield related traits collected from different agro-climatic regions.The list of genotypes is presented in Table 1.The collection included lines released as cold tolerant varieties viz., Pusa Sheetal, varieties grown and well acclimatized in cold stressed areas viz., Polish Tomato (heirloom variety from Poland; vigorous plants set fruit well even in cool weather), Sub Arctic Plenty (these tomatoes were developed in Alberta for prairie climates and set fruit under cold conditions), Coldset (an heirloom variety from Canada; the seeds withstand low temperatures), Black Plum (Russian heirloom variety, it is a fairly hardy cultivar and produces fruits in cooler temperatures) Stupice (it is Czechoslovakian-bred vine; it can blossom at cooler temperatures and can grow in climates as cold as those found in Alaska), Glacier (very cold tolerant and may survive a light frost also)., various species viz., WIR-3957 (Solanum peruvianum), WIR-13,717 (Solanum lycopersicum var cerasiforme), WIR-13,706 (Solanum lycopersicum var cerasiforme), WIR-5032 (Solanum chilense), WIR-13,708 (Solanum lycopersicum var cerasiforme), IIHR-1939 (Solanum pimpinellifolium) and IIHR-2805 (Solanum peruvianum), local landrace viz., Local-1 and breeding lines already identified as cold tolerant viz., IARI-2 and IARI-3.

Phenotyping
In case of tomato, the optimal temperature 15 °C to 30 °C.Tomatoes are sensitive to cold stress and it has Experiments were conducted at experimental farm of the Division of Vegetable Science, SKUAST-K, Shalimar (34 o N latitude and 74.89 o E longitude, 1685 m above MSL).The population was evaluated in randomized block design.The spacing followed was 60 cm × 45 cm.All the individuals of the population were apportioned into a total of three blocks along replications.The maximum, minimum, and mean temperatures were recorded daily during the entire cropping season for both years.The mapping population was phenotyped for yield and yield related traits viz., days to emergence, seedling length at transplant (cm), number of flowers per truss, number of days to first fruit set, number of fruits per truss, number of days to first harvest, average fruit weight (g), fruit shape index, number of fruits per plant, fruit yield per plant (kg), number of primary branches, plant height (cm) and duration of harvest and cold tolerance traits viz., pollen viability (% (fluorescent microscope (LMI microscope ABE-UK)), malondialdehyde content (nmol gfw −1 [13]), proline content (µmol gfw −1 [14]), total leaf chlorophyll content (mg 100 −1 g [15]), ascorbic acid (mg 100 −1 g A.O.A.C (1984)), lycopene content (mg 100 −1 g [16]), total phenols (mg 100 −1 g (Malick and Singh (1980) and total soluble sugars (mg 100 −1 g [17]).

Analysis of variance, best linear unbiased prediction (BLUP), and heritability
The ANOVA for the genotypes was performed using Metan: An R package [18], for individual environments using the mixed model analysis.For each trait and environment, the analysis was performed considering entry and block (nested within replication) as random effects and replication as fixed effects.
Broad-sense heritability was calculated as H 2 = Vg/Vp.In GAPIT-R, the best linear unbiased predictors (BLUPs) of each genotype were calculated for each environment.The calculated BLUPs were then used in the GWAS analysis.

DNA extraction, genotyping and single-nucleotide polymorphism calling
DNA from 50 genotypes was isolated from leaf samples by using Xcelgen Plant gDNA Isolation Kit (XG2611-01) as per the protocol described.The DNA was eluted in 50 µl Nuclease-Free Water.The samples were quantified using Qubit Fluorometer.For determining A260/280 ratio, 1 µl of sample was loaded in Nanodrop8000 spectrophotometer.The quality of samples was checked in 0.8% Agarose gel electrophoresis along with Hind III marker for the presence of intact bands.The DNA from samples were digested using ApeK1 enzyme and the GBS library was prepared from the digested DNA fragments by ligating adaptors specific to the cut-site.The library pool was analyzed in Bioanalyzer 2100 (Agilent Technologies) using High Sensitivity (HS) DNA chip and then sequenced independently on Illumina platform to generate up to ~ 3 million reads/ sample.These barcodes tagged fastq files were analysed.TASSEL5 GBSv2 pipeline was used for identification of tags at cut sites and SNPs located across reference genome.Tomato reference genome (S_lycopersicum_chromosomes.4.00.fa.gz) employed was downloaded from solgenomics /(ftp://ftp.solgenomics.net/tomato_genome/assembly/build_4.00/).The fastq files were then aligned against the tomato reference genome using the Bowtie2 tool version 2.2.9.The .sam file created from the Bowtie2 aligner program was used through SAMToGBSdbPlugin.The alignment file was then processed by using the GBSv2 analysis pipeline for SNP calling and genotyping.

Population structure, kinship and linkage decay (LD) analysis
STRU CTU RE software (v.2.3.4).was used to assess the population genetic structure among 50 tomato genotypes [19] with admixture model.Population structure was estimated based on total SNP loci (3854 SNPs) and K from 1 to 10 with 10 independent runs for each K.To determine the probable number of clusters based on genotypes, the software parameters were set to 5000 burnin and 50,000 MCMC (Markov Chain Monte Carlo) iteration.Structure output was then subjected to structure harvester for identification of effective number of clusters using the Evanno test implement in STRU CTU RE HARVESTER (Earl 2012).The Principal component analysis was obtained using TASSEL (5.0) for determination of percentage of variation explained by top three principal components.For phylogenetic analysis the SNP data was imported to SNPhylo with following parameters;-m 0.05 -a 478 -A-M 0.5-b-B 192; where m is the minor allele frequency; a is the total number of autosomes, b is Performs; M Missing rate.Tassel was used for generation of Kinship matrix and to know the relatedness among 50 tomato genotypes based on shared alleles among individuals.Linkage Disequilibrium (LD) was estimated using TASSEL (5.0) and LD curve was fitting using a nonlinear model as described by [20].

Genome wide association analysis (GWAS)
The Genome-wide association analysis involves regressing each SNP separately on a given trait, adjusted for various confounding variables.After removing four controls, the genotypes were processed for GWAS analysis.TASSEL and GAPIT was used for GWAS analyses.SNPtrait association analysis was performed using Mixed Linear Model (MLM) [21] implemented in TASSEL that include correction for both population structure and kinship.GAPIT was also used for GWAS analysis, which is a Genome Association and Prediction Integrated Tool.
GAPIT is a package that is run in the R software environment.GAPIT's MLM (Mixed Linear Model) was used for GWAS analysis.Six phenotypic data sets representing each of six environments were used independently for genome wide association study.After GWAS analysis, SNPs showing association with particular trait at P-value < = 0.05 were considered as significant SNPs.The qqman R package [22] was used to plot Manhattan and quantile-quantile (QQ) plots.A 5% significance level was used to identify SNPs significantly associated with trait.

Identification of candidate genes
Functional annotation of the predicted underlying genes with significantly associated SNPs was performed using the Solanum lycopersicum SL4.0 genome browser.Nearest genes located upstream or downstream of the significant markers were considered.Gene models were blasted against Tomato Genome Proteins (ITAG release 4.0) to determine the gene annotation.

Phenotypic performance and genetic variability
The Analysis of variance performed on the traits using environment (Y), genotype (G) and genotype x environment (G x Y) interactions as effects of the model as    presented in Table 2 revealed highly significant differences among all the genotypes under study for all traits and environments except for the trait fruit shape index thereby indicating a good amount of variability in the present material.The perusal of Table 3 revealed that mean performance of genotypes for various traits under different environments showed large variation and also the range was high for almost all the traits under study indicating that wide variation existed in the population.

Population structure, distribution, principal component analysis, kinship matrix and linkage disequilibrium (LD)
Three million reads/sample were generated by sequencing using on Illumina platform.The raw sequence data were filtered to remove low-quality bases, adapter contamination and uncalled bases to produce high-quality sequence data.Then Tassel pipeline resulted in generation of 10,802 SNPSs after SNPS calling.After applying various quality-filtering parameters (MAF > = 0.05, MAC > = 10, Missing Data < = 50%) 3854 SNPSs were retained for downstream analysis.of 10,802 variants.Linkage decay was observed after 1.0 Mb distance (Fig. 1) that has a practical significance of identifying significant trait association even with a smaller number of markers.Using SNPs genotype file, the population structure within 50 genotypes was investigated.The ΔK method was used to infer the correct number of subpopulations.The ΔK method takes the rate of change of the mean probability values (LnP) of each subpopulation into consideration.The structural analysis led to the identification of four (K = 4) genetically distinct subpopulations.The rate of change was maximum (159.83) at K = 4 (Fig. 2); therefore, we considered four subpopulations in our population of 50 tomato genotypes.
For the visualization of population structure, admixture analysis and population distribution at optimum K value of 4, bar plot was generated (Fig. 3).Then, based on the admixture coefficient obtained from STRU CTU RE, number of samples falling under each population was determined.Genotypes with membership probabilities higher than 0.6 were assigned to one of the subpopulations (Table 4) whereas remaining were considered as admixed.
These four subpopulations (Fig. 4) possessed an uneven distribution of genotypes with P2 receiving the maximum genotypes equal to 37. P3, P1 and P4 received 5, 4, 2 genotypes respectively and the population consisted of 2 admixtures.

Genome wide association study
Then Tassel pipeline resulted in generation of 10,802 variants on aligning against the tomato reference genome.These variants were filtered to retain only SNPs which were to be further used for downstream analysis GWAS analysis.After filtration, 3854 SNPs were retained.SNPs were not distributed evenly across all chromosomes.The top significant marker trait associations are presented in Table 5.

Marker trait association for cold tolerance traits Pollen viability
A total of 265 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 119 significant SNPs were mapped.The mapping of significant SNPs was onto chromosomes 1, 2, 8, 9, 11 and 12 mostly.CH09_32765160, CH09_32765161 and CH09_32765231 were mapped across both the protected environments (E2 and E6) in both years explaining major variation of > 10% with additive effect.CH08_9512860 was mapped across both the normal environments (E1 and E4) in both years explaining a major phenotypic variation of > 10% (PVE) with negative effect.CH08_9396204 was mapped across both the cold stressed environments (E3 and E5) with explaining a major phenotypic variation of > 10% with additive effect.

Malondialdehyde
A total of 728 significant SNPS were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 238 significant SNPs were mapped.The mapping of significant SNPs was mostly onto chromosomes 1, 6, 2, 8, 9 and 11.CH08_9395919 and CH02_22999374, were mapped across both the normal environments (E1 and E4) in both years explaining a major phenotypic variation of > 10% (PVE) with negative effect which is desirable for the stated trait.CH08_9513555, CH08_9513565 and CH02_22989441 were mapped across both the protected environments (E2 and E6) in both years explaining major variation of > 10% with negative effect.CH08_9292940 was mapped across both the cold stressed environments with additive variance explaining a minor phenotypic variation of > 10% with negative effect.

Total leaf chlorophyll content
A total of 163 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 35 significant SNPs were mapped.The mapping of significant SNPS for total leaf chlorophyll content was onto chromosomes 1, 2, 3, 8, 9, 10 and 11 mostly.CH08_9408610, CH09_32797168, CH09_32797221 and CH09_32797222 were mapped across both the normal environments (E1 and E4) in both years explaining a minor phenotypic variation of < 10% (PVE) with additive effect.CH09_32800680, CH09_32800681 and CH09_32797168 were mapped across both the protected environments (E2 and E6) in both years explaining major variation of > 10% with additive effect.

Ascorbic acid
A total of 792 significant SNPS were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 236 significant SNPS were mapped.The mapping of significant SNPs was mostly onto chromosomes 2, 3, 9, 10 and 11.Some of the top significant SNPs across environments overlapped, indicating some major effect loci.CH11_51779471 was mapped across all the six environments explaining a major phenotypic variation of > 10% (PVE) with additive effect.CH10_64514613 and CH10_64514614 were mapped across both the normal environments (E1 and E4) in both years explaining minor phenotypic variation of < 10% with additive effects.CH02_4489 and CH09_32801508 were mapped across both the protected environments (E2 and E6) in both years explaining major variation of > 10%.CH11_51768254 was mapped across both the cold stressed environments with additive variance explaining a minor phenotypic variation of < 10%.

Lycopene content
A total of 799 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 314 significant SNPS were mapped.The mapping of significant SNPs was onto chromosomes 2, 5, 6, 9 and 11 mostly.CH06_10446131 and CH06_10446132 were mapped across five environments except the late sown environment explaining a major phenotypic variation of > 10% (PVE) and additive effect.CH11_13438931 was mapped across both the normal environments (E1 and E4) in both years explaining a major phenotypic variation of > 10% (PVE) with additive effect.CH11_2043421 was mapped across both the protected environments (E2 and E6) in both years explaining major variation of > 10% with additive effect.

Total phenols
A total of 643 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 263 significant SNPs were mapped.The mapping of significant SNPs was onto chromosome 2, 3, 6, 7, 8, 9 and 11 mostly.CH06_8656494, CH06_8656496 and CH07_67281761 were mapped across both the normal environments (E1 and E4) in both years explaining a minor phenotypic variation of < 10% (PVE) with additive effect.CH03_542789 and CH09_32765930 were mapped across both the protected environments (E2 and E6) in both years explaining major variation of > 10% with additive effect.

Total soluble sugars
A total of 370 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 206 significant SNPS were mapped.The mapping of significant SNPs was onto chromosomes 2, 6, 8, 9 and 11 mostly.CH08_9452288, CH08_9452289 and CH06_43801072 were mapped across both the normal environments (E1 and E4) in both years explaining a minor phenotypic variation of < 10% (PVE) with additive effect.CH02_22990072 and CH11_13438935 were mapped across both the protected environments (E2 and E6) in both years explaining minor phenotypic variation of < 10% with additive effect.

Marker trait association for yield and yield related traits Days to emergence
A total of 456 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 85 significant SNPS were mapped.The mapping of significant SNPs was onto chromosomes 1, 2, 6, 7, 8 9 and 11 mostly.CH09_32800889 and CH09_32800890 were mapped across both the normal environments (E1 and E4) in both years explaining a minor phenotypic variation of < 10% (PVE) with negative effect desirable for the trait.CH06_11575381, CH06_11575394, CH06_11575453 and CH06_8639002 were mapped across both the protected environments (E2 and E6) in both years explaining minor phenotypic variation of < 10% with negative effect.

Seedling length at transplant
A total of 702 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 309 significant SNPS were mapped.The mapping of significant SNPs for this trait was onto chromosomes 4, 5, 6, 8, 9 and 11 mostly.CH05_26702700, CH05_26702702 and CH08_9396204 were mapped across both the normal environments (E1 and E4) in both years explaining a major phenotypic variation of > 10% (PVE) with additive effect.

Number of flowers per truss
A total of 187 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 71 significant SNPS were mapped.The mapping of significant SNPs was onto chromosome 3, 8, 9, 11 and 12 mostly.CH03_2152187, CH08_9512856, CH09_32801176, CH09_32801177 and CH09_32801178 was mapped across both the protected environments (E2 and E6) in both years explaining major phenotypic variation of > 10% with additive effect.

Number of days to first fruit set
A total of 365 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 25 significant SNPs were mapped.The mapping of significant SNPS was onto chromosomes 2, 6, 8, 9, 11 and 12 mostly.CH09_32796951, CH09 32,796,992 and CH09_32797021 were mapped across both the normal environments (E1 and E4) in both years explaining a major phenotypic variation of > 10% (PVE) with negative effect desirable for the trait.

Number of fruits per truss
A total of 362 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 31 significant SNPs were mapped.The mapping of significant SNPS was onto chromosome 1, 3, 7, 9, 11 and 12 mostly.CH12_49436823 was mapped across both the protected environments (E2 and E6) in both years explaining minor phenotypic variation of < 10% with additive effect.

Number of days to first harvest
A total of 265 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 35 significant SNPS were mapped.The mapping of significant SNPs was onto chromosomes 1, 2, 3, 8, 9 and 11 mostly.CH11_13438926 was mapped across both the protected environments (E2 and E6) in both years explaining major phenotypic variation of > 10%.

Average fruit weight
A total of 1085 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 346 significant SNPS were mapped.The mapping of significant SNPs was onto chromosomes 7, 8, 9 and 11 mostly.CH09_32800889, CH09_32800890, CH11_51768254 and CH11_51768333 were mapped across all the environments except the late sown environment in both years explaining minor phenotypic variation of < 10% with negative effect.

Fruit shape index
A total of 793 significant SNPS were mapped for this trait under all the six environments.

Discussion
Every degree increase in ambient temperature, in the context of climate change, has a significant impact on crop yield, particularly for tomato, which is exceptionally sensitive to cold temperatures.Tomato cultivar's low temperature sensitivity limits their geographic distribution, cultivation and growing season.Therefore, understanding the nature, impact, and molecular mechanisms of cold stress tolerance will help in designing strategies to overcome production losses.Previously, no studies have been reported on GWAS for cold tolerance in tomato.Studies were more focused on understanding the nature, impact, and existing diversity in germplasm lines as well as identifying the genomic regions responsible for yield and quality traits.In this study, we reported some marker trait associations for cold stress tolerance in tomato.The analysis of variance revealed highly significant differences among all the genotypes under study for all traits and environments thereby indicating a good amount of variability in the present material.The mean performance of genotypes for various traits under different environments showed large variation and also the range was high for almost all the traits under study indicating that wide variation existed in the population.Heritability in broad sense was found to be high for all the characters under study.High heritability for physiological traits indicated that the selection for cold tolerance relying on these traits could be effective.Its value ranged from 89 − 100%.Studies for varaiblity of germplasm were conducted by various workers viz., [23,24] and were of the same opinion that there was presence of sufficient variability in the tomato germplasm.
Based on STRU CTU RE, SNPhylo and Kinship matrix, the tomato (50 lines) germplasm belonging to different species was categorized into four distinct populations.These four subpopulations possessed an uneven distribution of genotypes with P2 receiving the maximum genotypes equal to 37. P3, P1 and P4 received 5, 4, 2 genotypes respectively and the population consisted of 2 admixtures.P2 consisted of majority of germplasm including exotic collections, land races, released varieties belonging majorly to species Solanum lycopersicum (31) and some of them to other species; Solanum pimpinellifolium (2), Solanum lycopersicum var.cerasiforme (3), Solanum peruvianum (1).The grouping of Solanum pimpinellifolium, in the same cluster with other Solanum lycopersicum lines was also reported by [25] and [26] as sister groups.Other subpopulation that is P1 consisted of 4 genotypes belonging to species Solanum peruvianum (1), a derivative of Solanum habrochites (1) and Solanum lycopersicum (2).The clustering of Solanum chilense and Solanum habrochaites together can possibly be due to their origin Southern Peru where they are mainly found [27].P3 consisted of 5 genotypes belonging to Solanum lycopersicum var.cerasiforme (2), Solanum chilense (1) and Solanum lycopersicum (2).The grouping of Solanum chilense and Solanum lycopersicum var.cerasiforme together can be attributed to the complementary properties they posses for salt-stress resistance and smaller fruit size as proved by [28].P4 consisted of 2 genotypes both belonging to Solanum lycopersicum.Admixtures consisted of 2 genotypes both of them being exotic collections belonging to species Solanum lycopersicum.The admixture level of first admixture obtained was higher in sub-population 1 and that of second admixture obtained was higher in sub-population 3 among 4 populations generated.The tendency of genotypes being distributed into various clusters by species indicated that the species were genetically unique.Clustering by origin region found that accessions with similar genetic similarity are also geographically similar.The distribution, on the other hand, lacked a distinct pattern.The genetic structure discovered in our study implies that selection for market specialisation has left a genetic trace in grown tomatoes.The observed variance also suggests that various patterns of genetic variation in farmed tomatoes are caused by wild species introgression and market specialised selection.Tomatoes (Solanum lycopersicum) have been subjected to extensive selection both during and after domestication.
Tomato has been a forerunner in QTL mapping for agronomic and yield-related traits on populations.Several genome wide association mapping investigations have been conducted in the previous decade, resulting in the discovery of novel loci for fruit quality, metabolites and flavor-related compounds.Efforts have also been made to explore major agronomic features in core sets of improved and wild cultivars.However, all the research was focused on yield and yield-related variables under normal conditions, resulting in the identification of several associated genomic areas, but no attempts were made to map loci under cold stress and protected environments.But this study, allowed us to identify marker trait associations for cold stress in tomato, since it was the first and one of its kind.In addition to reporting marker trait associations for cold tolerance traits, associations for yield and yield related traits under normal and cold stressed environments was also done which to the best of our knowledge, is the first comprehensive study.Cold stress impairs the plant system and therefore there are substances which act as markers for susceptibility or resistance.Pollen viability, total leaf chlorophyll content and lycopene act as such markers and are all reduced in response to cold stress in susceptible plants as a response to cold stress.Poor winter fruit set in tomato has been reported by drop in both pollen quality and quantity [29] where in cold stress alters anther's metabolic pathways, causing pollen sterility.Sharma KD. et al. [30] reported that unlike vulnerable plants, cold-tolerant plants produce a large amount of viable pollen.The mapping of significant SNPs was onto chromosomes 1, 2, 4, 8, 9, 11 and 12 mostly for pollen viability.Xu J. et al. [31] also mapped a single QTL for pollen viability on chromosome 11 in tomato.The gene Solyc04g082980.2annotated as Tetratricopeptide repeat protein SKI3 was identified for the trait.Rosado A. et al [32] reported that Tetratricopeptide repeat protein regulates the transcript levels of several dehydration-responsive genes, such as the transcription factor DREB2A and genes encoding dehydration response proteins, such as ERD1 (early response to dehydration 1), ERD3 and COR15a.Furthermore Solyc02g022930.3annotated as 3-hydroxyisobutyrate dehydrogenase was identified and was also colocalized for the trait malondialdehyde which is a marker of oxidative lipid injury.Schertl P. et al. [33] reported that under abiotic stress conditions in Arabidopsis, 3-hydroxyisobutyrate dehydrogenase acts as a NADH-generating enzyme of the branched chain amino acid degradation pathway.For total leaf chlorophyll content the mapping of significant SNPs for total leaf chlorophyll content was onto chromosomes 1, 2, 3, 8, 9, 10 and 11 mostly.Efrati A. et al. [34] mapped chlorophyllase on to chromosomes 6 (CHLASE 1) and chromosome 9 (CHLASE 2) in tomato lines.Gene Solyc11g017345.1 annotated as F-box/FBD/ LRR-repeat protein was identified.F-box proteins regulate diverse cellular processes, including cell cycle transition, transcriptional regulation and signal transduction, by playing roles in Skp1p-cullin-F-box protein (SCF) complexes or non-SCF complexes [35].At least 43 F-box protein-encoding genes have been found to be differentially expressed in rice seedlings subjected to different abiotic stress conditions [36].It was also colocalized for the trait fruit shape index indicating that if one trait is introgressed other one will simultaneously improve which are key for achieving resilience to cold stress tolerance.Also lycopene is one of the most important carotenoid contributing for 98% of the pigment of tomato fruit colour.In rice seedlings [37] reported that carotenogenic gene mutations lead to increased oxidative stress in rice seedlings and verified the anti-oxidative function of carotenoids.The mapping of significant SNPs for lycopene was onto chromosomes 2, 5, 6, 8, 9, 10 and 11 mostly.Zhang J. et al. [38] in tomato detected significant marker trait associations for carotenoid-derived volatiles on chromosomes 6, 9 and 11 explaining > 10% phenotypic variation.Gene named Thioredoxin was identified.da Fonseca-Pereira P. et al. [39] reported that in Arabidopsis, Thioredoxin transcripts were more highly expressed under drought and produce an effect that is stronger during repetitive drought/recovery events.
Apart from reduced content of few compound's there are certain compounds which accumulate in response to cold stress in tolerant plants and some which act as protectants.One of the most important amino acid known to impart stress tolerance is proline.A link between stress tolerance acquisition and proline accumulation has been established by [40].Highly significant marker trait associations were discovered for proline content.For proline content the mapping of significant SNPs was onto chromosome 1, 2, 8 and 11 mostly.Sauvage C. et al. [41] also reported significant association for proline content on chromosome 2 in tomato.Similarly chillinginduced oxidative damage in tomatoes was reported to be mitigated by ascorbic acid [42].It works by reducing electrolyte loss, lipid peroxidation and hydrogen peroxide levels.For ascorbic acid the mapping of significant SNPs was mostly onto chromosomes 2, 3, 9, 10, 11 and 12 mostly.Stevens R. et al. [43] identified common regions controlling ascorbic acid content on chromosomes, 9 and 10 in tomato and on chromosomes 9 and 11, [41] discovered significant associations for ascorbic acid content in tomato.Also, [44] in their research linked one locus to a previously identified ascorbic acid content large-effect QTL on chromosome 9 by [43] in tomato.Significant marker trait associations were detected on chromosome 3, 11 in tomato by [45].Gene Solyc03g007550.4annotated as LIM domain protein was identified for the trait.Park CJ et al. [46] reported that LIM genes have been involved in resistance against biotic and abiotic stresses in Brassica.It was also colocalized for the traits viz., number of fruits per plant thereby indicating an important correlation with fruit yield improvement under cold stressed conditions.Also colocalization with the traits viz., number of fruits per truss, number of fruits per plant, fruit yield per plant under normal and protected conditions indicates that simultaneous improvement can be done for many traits.
Compounds such as phenols and sugars protect plant cells against damage induced by cold stress in a variety of ways, including functioning as osmoprotectants, nutrients and interacting with the lipid bilayer [47].Phenolic substances which are non-enzymatic antioxidants (Phenolic acids and flavonoids) can trap and scavange reactive oxygen species (ROS).For total phenols the mapping of significant SNPs was onto chromosome 1, 2, 3, 6, 7, 8, 9, 10 and 11 mostly.Ruggieri V. et al. [48] reported that phenol content was associated with markers on chromosomes 8 and chromosome 11 in tomato.The mapping of significant SNPs for total soluble sugars was onto chromosomes 1,2,3,6, 8, 9 and 11 mostly.Similar significant marker trait associations for soluble solids on chromosomes 2, 8 and 9 in tomato were detected by [45,49] also detected significant associations for soluble solids on chromosome 2 in tomato.Significant marker trait associations for sugars on chromosomes 1, 2, 3 and 11 in tomato by were detected by [50].Significant associations for soluble solid content on chromosomes 2, 8, 9 and 11 were detected by [50].Sauvage C. et al. [41] also reported significant associations for sucrose content on chromosomes 2 and rhamnose on chromosome 8 in tomato.
To study yield and yield related traits in response to cold stress is very important aspect of the study because yield is the ultimate goal of a breeder along with climate resilience.For seedling length at transplant the mapping of significant SNPs was onto chromosomes 4, 5,6,8,9 and 11 mostly.Similar results were also observed by [51] in tomato mapped QTLs on chromosomes 1, 4, 6, 9 and 11 for various seedling traits viz., for hypocotyl length on chromosome 6, shoot weight on chromosome 9 and total root size on chromosome 9 and 11.For number of days to first harvest the mapping of significant SNPs was onto chromosomes 1, 2, 3, 8, 9 and 11 mostly.Significant association was observed by [52] in tomato for number of days between planting and ripening date of first fruit on chromosome 2 and for number of days between flowering and red ripe stage on chromosome 2. Significant marker trait associations in tomato for time to ripen on chromosomes 1, 2, 3, 9 and 10 were also observed by [50] in tomato.
For number of flowers per truss the mapping of significant SNPs was onto chromosome 3, 8, 9, 11 and 12 mostly.Significant associations were mapped for flower number on the third inflorescence on chromosome 9 by [44] in tomato.Also another important gene Solyc11g039880.2annotated as Nucleoporin was identified.Dong CH et al. [53] reported that Nucleoporin (AtNUP160) in Arabidopsis was critical for the nucleocytoplasmic transport of mRNAs and it played important roles in plant growth and flowering time regulation and is required for cold stress tolerance.A defect in Nup160/ SAR1 was also found to sensitize plants to chilling stress and disrupt acquired freezing tolerance.This gene is important in context of flowering under cold stressed conditions.For number of fruits per truss the mapping of significant SNPs was onto chromosome 1, 3, 7, 9, 11 and 12 mostly.Significant associations were mapped for fruit number on the third truss on chromosome 9 by [44] in tomato.Solyc11g150146.1 annotated as Elongation factor 1-alpha was identified and it was colocalized for the traits seedling vigor index and duration of harvest.
For fruit yield per plant the mapping of significant SNPs was onto chromosomes 1, 3, 4, 6, 7, 8, 9, 10, 11 and 12 mostly.Phan NT et al. [54] reported significant marker trait associations for fruit height on chromosomes 1, 3, 4, 7, 8; for fruit width on chromosomes 1, 3, 6, 10 and 11and for fruit weight on chromosomes 1, 3, 4, 6, 10 and 11 in tomato.For average fruit weight the mapping of significant SNPs was onto chromosomes 7, 8, 9 and 11 mostly.Significant associations were found for fruit weight on chromosome 11 by [49] and [45] in tomato while on chromosomes 7 and 9 by [52].Candidate genes Solyc07g065840.2annotated as molecular chaperone Hsp90-2 was identified it was also colocalized for the trait fruit yield per plant.Park CJ et al. [46] reported that heat shock proteins (HSPs) functioning as molecular chaperones are the key components responsible for protein folding, assembly, translocation and degradation under stress conditions and in many normal cellular processes.Shirasu K. et al. [55] also reported that HSP90s also play essential roles in plant immunity.Fruit weight is one the most important traits contributing to fruit yield and colocalization indicates a higher scope of improvement of fruit yield.Also in case of number of fruits the mapping of significant SNPs was onto chromosomes 2, 3, 9, 10, 11 and 12 mostly.Significant marker trait associations for fruit number on chromosomes 2, 10, 11 and 12 were reported by [50] in tomato.For fruit shape index the mapping of significant SNPs was onto chromosomes 1, 2, 3, 4, 6, 8, 9 and 11 mostly.Significant associations were found for fruit shape on chromosome 1, 8 and 9 by [45], on chromosome 8 by [49] and on chromosome 1, 2, 3, 4, 6, 8, 9 and 11 by [56] in tomato.

Conclusion
A total of 4517 significant marker trait associations were revealed for cold tolerance traits which hampers the growth and development of the crop throughout the season.Also total of 5727 significant marker trait associations were revealed for yield and yield related traits uncovering important associations for fruit yield and directly contributing traits.685 candidate genes were identified among all traits under study.Based on functional categorization, 60 genes were found to be associated with biological processes in these genomic regions.7 among 60 were directly found to be related to abiotic stress tolerance and function directly or indirectly as stress responsive genes.This study could be used for breeding programme for developing high yielding hybrids.

Fig. 2
Fig.2Plot showing the ΔK values for cluster size K = 1 to K = 10.The estimation of ΔK is performed at 10 independent runs burnin = 50,000 and MCMC = 100,000.Optimum cluster was obtained at K4 757 significant SNPs were mapped for this trait under all the six environments.Under the cold stressed environments (E3 and E5) of open field conditions 275 significant SNPS were mapped.The mapping of significant SNPs was onto chromosome 8 and 11 mostly.CH08_9484802 and CH08_9484810 were mapped across both the cold stressed environments (E3 and E5) with explaining a major phenotypic variation of > 10% with the former having negative effect and latter having additive effect.CH08_9484802 (8.75*10 − 6), CH08_9484810 (8.75*10 − 6) and CH08_9528421 (7.65*10 − 5) were mapped across early cold environment with highly significant P-values.CH08_9456588 and CH08_9435068 were mapped across both the protected environments (E2 and E6) in both years explaining major variation of > 10% with additive effect.CH08_9435068 was mapped across both the normal environments (E1 and E4) in both years explaining a major phenotypic variation of > 10% (PVE) with additive effect.

Fig. 3 Fig. 4 Fig. 5 3 Fig. 6 A
Fig. 3 Bar Plots for K = 4 showing the population structure and genetic diversity present in each sample with their admixtures from 4 populations.The admixtures for each population in each sample shows the total proportion of each genotype present in the samples and their diversity

Table 1
List of Solanum genotypes used in the study

Table 2
Phenotypic Performance and Genetic Variability using environment (Y), genotype (G) and genotype x environment (G x Y) interactions

Table 3
Environment wise estimates of mean and range for different traits in tomato (Solanum spp.

Table 4
Population assigned based on admixture coefficient

Table 5
Top significant marker trait associations under different environments

Table 6
List of candidate genes identified S.